A Mediator for Heterogeneous Gazetteers
Authors
Abstract
Gazetteers are catalogs of geographic objects, typically classified using terms taken from a thesaurus. Mediated access to several gazetteers requires a strategy for dealing with the heterogeneity of their thesauri. This paper outlines the implementation of a mediator for heterogeneous gazetteers. The mediator incorporates an instance-based technique for aligning thesauri that uses the results of user queries as evidence.
Related resources
Exploiting multiple heterogeneous data sets for improving geotagging quality
Geotagging is the process of associating with textual data items the geographic position they denote, usually in the form of geographical coordinates (latitude and longitude). Automatic geotagging is often trivial relying on one of the many available gazetteers, such as OpenStreetMap (OSM). However, such knowledge bases are not free of errors, and, while this simple match works for popular loca...
Towards Heterogeneous Resources-Based Ambiguity Reduction of Sub-typed Geographic Named Entities
The aim of this work is to find sub-typed Geographic Named Entities from the analysis of relations between Place Names surrounded nominal group within a specific phrasal context in a set of textual documents. The paper presents a method involving natural language processing and heterogeneous resources like gazetteers, thesauri or ontologies. The work and the results focus a French language corpus....
Semi-supervised learning of geographical gazetteers from the internet
In this paper we present an approach to the acquisition of geographical gazetteers. Instead of creating these resources manually, we propose to extract gazetteers from the World Wide Web, using Data Mining techniques. The bootstrapping approach, investigated in our study, allows us to create new gazetteers using only a small seed dataset (1260 words). In addition to gazetteers, the system produ...
EAGER: Extending Automatically Gazetteers for Entity Recognition
Key to named entity recognition, the manual gazetteering of entity lists is a costly, error-prone process that often yields results that are incomplete and suffer from sampling bias. Exploiting current sources of structured information, we propose a novel method for extending minimal seed lists into complete gazetteers. Like previous approaches, we value WIKIPEDIA as a huge, well-curated, and re...
Named entity recognition with document-specific KB tag gazetteers
We consider a novel setting for Named Entity Recognition (NER) where we have access to document-specific knowledge base tags. These tags consist of a canonical name from a knowledge base (KB) and entity type, but are not aligned to the text. We explore how to use KB tags to create document-specific gazetteers at inference time to improve NER. We find that this kind of supervision helps recognis...